{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# set tf 1.x for colab\n",
    "%tensorflow_version 1.x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Generating names with recurrent neural networks\n",
    "\n",
    "This time you'll find yourself delving into the heart (and other intestines) of recurrent neural networks on a class of toy problems.\n",
    "\n",
    "Struggle to find a name for the variable? Let's see how you'll come up with a name for your son/daughter. Surely no human has expertize over what is a good child name, so let us train RNN instead;\n",
    "\n",
    "It's dangerous to go alone, take these:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:26:42.696201Z",
     "start_time": "2018-08-13T20:26:38.104103Z"
    }
   },
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "print(tf.__version__)\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "import os\n",
    "import sys\n",
    "sys.path.append(\"..\")\n",
    "import keras_utils\n",
    "import tqdm_utils"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Load data\n",
    "The dataset contains ~8k earthling names from different cultures, all in latin transcript.\n",
    "\n",
    "This notebook has been designed so as to allow you to quickly swap names for something similar: deep learning article titles, IKEA furniture, pokemon names, etc."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:26:42.701832Z",
     "start_time": "2018-08-13T20:26:42.697766Z"
    }
   },
   "outputs": [],
   "source": [
    "start_token = \" \"  # so that the network knows that we're generating a first token\n",
    "\n",
    "# this is the token for padding,\n",
    "# we will add fake pad token at the end of names \n",
    "# to make them of equal size for further batching\n",
    "pad_token = \"#\"\n",
    "\n",
    "with open(\"names\") as f:\n",
    "    names = f.read()[:-1].split('\\n')\n",
    "    names = [start_token + name for name in names]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:26:42.707885Z",
     "start_time": "2018-08-13T20:26:42.703302Z"
    }
   },
   "outputs": [],
   "source": [
    "print('number of samples:', len(names))\n",
    "for x in names[::1000]:\n",
    "    print(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:26:42.857411Z",
     "start_time": "2018-08-13T20:26:42.709371Z"
    }
   },
   "outputs": [],
   "source": [
    "MAX_LENGTH = max(map(len, names))\n",
    "print(\"max length:\", MAX_LENGTH)\n",
    "\n",
    "plt.title('Sequence length distribution')\n",
    "plt.hist(list(map(len, names)), bins=25);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Text processing\n",
    "\n",
    "First we need to collect a \"vocabulary\" of all unique tokens i.e. unique characters. We can then encode inputs as a sequence of character ids."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:26:42.864592Z",
     "start_time": "2018-08-13T20:26:42.858725Z"
    }
   },
   "outputs": [],
   "source": [
    "tokens = ### YOUR CODE HERE: all unique characters go here, padding included!\n",
    "\n",
    "tokens = list(tokens)\n",
    "n_tokens = len(tokens)\n",
    "print ('n_tokens:', n_tokens)\n",
    "\n",
    "assert 50 < n_tokens < 60"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Cast everything from symbols into identifiers\n",
    "\n",
    "Tensorflow string manipulation is a bit tricky, so we'll work around it. \n",
    "We'll feed our recurrent neural network with ids of characters from our dictionary.\n",
    "\n",
    "To create such dictionary, let's assign `token_to_id`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:26:42.870330Z",
     "start_time": "2018-08-13T20:26:42.866135Z"
    }
   },
   "outputs": [],
   "source": [
    "token_to_id = ### YOUR CODE HERE: create a dictionary of {symbol -> its  index in tokens}\n",
    "\n",
    "assert len(tokens) == len(token_to_id), \"dictionaries must have same size\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:26:42.875943Z",
     "start_time": "2018-08-13T20:26:42.871834Z"
    }
   },
   "outputs": [],
   "source": [
    "def to_matrix(names, max_len=None, pad=token_to_id[pad_token], dtype=np.int32):\n",
    "    \"\"\"Casts a list of names into rnn-digestable padded matrix\"\"\"\n",
    "    \n",
    "    max_len = max_len or max(map(len, names))\n",
    "    names_ix = np.zeros([len(names), max_len], dtype) + pad\n",
    "\n",
    "    for i in range(len(names)):\n",
    "        name_ix = list(map(token_to_id.get, names[i]))\n",
    "        names_ix[i, :len(name_ix)] = name_ix\n",
    "\n",
    "    return names_ix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:26:42.883107Z",
     "start_time": "2018-08-13T20:26:42.877186Z"
    }
   },
   "outputs": [],
   "source": [
    "# Example: cast 4 random names to padded matrices (so that we can easily batch them)\n",
    "print('\\n'.join(names[::2000]))\n",
    "print(to_matrix(names[::2000]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Defining a recurrent neural network\n",
    "\n",
    "We can rewrite recurrent neural network as a consecutive application of dense layer to input $x_t$ and previous rnn state $h_t$. This is exactly what we're gonna do now.\n",
    "<img src=\"./rnn.png\" width=600>\n",
    "\n",
    "Since we're training a language model, there should also be:\n",
    "* An embedding layer that converts character id x_t to a vector.\n",
    "* An output layer that predicts probabilities of next phoneme based on h_t+1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:26:44.039419Z",
     "start_time": "2018-08-13T20:26:42.884581Z"
    }
   },
   "outputs": [],
   "source": [
    "# remember to reset your session if you change your graph!\n",
    "s = keras_utils.reset_tf_session()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:26:44.044903Z",
     "start_time": "2018-08-13T20:26:44.041084Z"
    }
   },
   "outputs": [],
   "source": [
    "import keras\n",
    "from keras.layers import concatenate, Dense, Embedding\n",
    "\n",
    "rnn_num_units = 64  # size of hidden state\n",
    "embedding_size = 16  # for characters\n",
    "\n",
    "# Let's create layers for our recurrent network\n",
    "# Note: we create layers but we don't \"apply\" them yet (this is a \"functional API\" of Keras)\n",
    "# Note: set the correct activation (from keras.activations) to Dense layers!\n",
    "\n",
    "# an embedding layer that converts character ids into embeddings\n",
    "embed_x = Embedding(n_tokens, embedding_size)\n",
    "\n",
    "# a dense layer that maps input and previous state to new hidden state, [x_t,h_t]->h_t+1\n",
    "get_h_next = ### YOUR CODE HERE\n",
    "\n",
    "# a dense layer that maps current hidden state to probabilities of characters [h_t+1]->P(x_t+1|h_t+1)\n",
    "get_probas = ### YOUR CODE HERE "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will generate names character by character starting with `start_token`:\n",
    "\n",
    "<img src=\"./char-nn.png\" width=600>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:26:44.053212Z",
     "start_time": "2018-08-13T20:26:44.048389Z"
    }
   },
   "outputs": [],
   "source": [
    "def rnn_one_step(x_t, h_t):\n",
    "    \"\"\"\n",
    "    Recurrent neural network step that produces \n",
    "    probabilities for next token x_t+1 and next state h_t+1\n",
    "    given current input x_t and previous state h_t.\n",
    "    We'll call this method repeatedly to produce the whole sequence.\n",
    "    \n",
    "    You're supposed to \"apply\" above layers to produce new tensors.\n",
    "    Follow inline instructions to complete the function.\n",
    "    \"\"\"\n",
    "    # convert character id into embedding\n",
    "    x_t_emb = embed_x(tf.reshape(x_t, [-1, 1]))[:, 0]\n",
    "    \n",
    "    # concatenate x_t embedding and previous h_t state\n",
    "    x_and_h = ### YOUR CODE HERE\n",
    "    \n",
    "    # compute next state given x_and_h\n",
    "    h_next = ### YOUR CODE HERE\n",
    "    \n",
    "    # get probabilities for language model P(x_next|h_next)\n",
    "    output_probas = ### YOUR CODE HERE\n",
    "    \n",
    "    return output_probas, h_next"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# RNN: loop\n",
    "\n",
    "Once `rnn_one_step` is ready, let's apply it in a loop over name characters to get predictions.\n",
    "\n",
    "Let's assume that all names are at most length-16 for now, so we can simply iterate over them in a for loop.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:26:44.342948Z",
     "start_time": "2018-08-13T20:26:44.056136Z"
    }
   },
   "outputs": [],
   "source": [
    "input_sequence = tf.placeholder(tf.int32, (None, MAX_LENGTH))  # batch of token ids\n",
    "batch_size = tf.shape(input_sequence)[0]\n",
    "\n",
    "predicted_probas = []\n",
    "h_prev = tf.zeros([batch_size, rnn_num_units])  # initial hidden state\n",
    "\n",
    "for t in range(MAX_LENGTH):\n",
    "    x_t = input_sequence[:, t]  # column t\n",
    "    probas_next, h_next = rnn_one_step(x_t, h_prev)\n",
    "    \n",
    "    h_prev = h_next\n",
    "    predicted_probas.append(probas_next)\n",
    "    \n",
    "# combine predicted_probas into [batch, time, n_tokens] tensor\n",
    "predicted_probas = tf.transpose(tf.stack(predicted_probas), [1, 0, 2])\n",
    "\n",
    "# next to last token prediction is not needed\n",
    "predicted_probas = predicted_probas[:, :-1, :]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# RNN: loss and gradients\n",
    "\n",
    "Let's gather a matrix of predictions for $P(x_{next}|h)$ and the corresponding correct answers.\n",
    "\n",
    "We will flatten our matrices to shape [None, n_tokens] to make it easier.\n",
    "\n",
    "Our network can then be trained by minimizing crossentropy between predicted probabilities and those answers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:26:44.354310Z",
     "start_time": "2018-08-13T20:26:44.344648Z"
    }
   },
   "outputs": [],
   "source": [
    "# flatten predictions to [batch*time, n_tokens]\n",
    "predictions_matrix = tf.reshape(predicted_probas, [-1, n_tokens])\n",
    "\n",
    "# flatten answers (next tokens) and one-hot encode them\n",
    "answers_matrix = tf.one_hot(tf.reshape(input_sequence[:, 1:], [-1]), n_tokens)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Usually it's a good idea to ignore gradients of loss for padding token predictions.\n",
    "\n",
    "Because we don't care about further prediction after the pad_token is predicted for the first time, so it doesn't make sense to punish our network after the pad_token is predicted.\n",
    "\n",
    "For simplicity you can ignore this comment, it's up to you."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:26:45.076642Z",
     "start_time": "2018-08-13T20:26:44.355594Z"
    }
   },
   "outputs": [],
   "source": [
    "# Define the loss as categorical cross-entropy (e.g. from keras.losses).\n",
    "# Mind that predictions are probabilities and NOT logits!\n",
    "# Remember to apply tf.reduce_mean to get a scalar loss!\n",
    "loss = ### YOUR CODE HERE\n",
    "\n",
    "optimize = tf.train.AdamOptimizer().minimize(loss)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# RNN: training"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:26:55.322187Z",
     "start_time": "2018-08-13T20:26:45.078296Z"
    }
   },
   "outputs": [],
   "source": [
    "from IPython.display import clear_output\n",
    "from random import sample\n",
    "\n",
    "s.run(tf.global_variables_initializer())\n",
    "\n",
    "batch_size = 32\n",
    "history = []\n",
    "\n",
    "for i in range(1000):\n",
    "    batch = to_matrix(sample(names, batch_size), max_len=MAX_LENGTH)\n",
    "    loss_i, _ = s.run([loss, optimize], {input_sequence: batch})\n",
    "    \n",
    "    history.append(loss_i)\n",
    "    \n",
    "    if (i + 1) % 100 == 0:\n",
    "        clear_output(True)\n",
    "        plt.plot(history, label='loss')\n",
    "        plt.legend()\n",
    "        plt.show()\n",
    "\n",
    "assert np.mean(history[:10]) > np.mean(history[-10:]), \"RNN didn't converge\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# RNN: sampling\n",
    "Once we've trained our network a bit, let's get to actually generating stuff. All we need is the `rnn_one_step` function you have written above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:26:55.341196Z",
     "start_time": "2018-08-13T20:26:55.323787Z"
    }
   },
   "outputs": [],
   "source": [
    "x_t = tf.placeholder(tf.int32, (1,))\n",
    "h_t = tf.Variable(np.zeros([1, rnn_num_units], np.float32))  # we will update hidden state in this variable\n",
    "\n",
    "# For sampling we need to define `rnn_one_step` tensors only once in our graph.\n",
    "# We reuse all parameters thanks to functional API usage.\n",
    "# Then we can feed appropriate tensor values using feed_dict in a loop.\n",
    "# Note how different it is from training stage, where we had to unroll the whole sequence for backprop.\n",
    "next_probs, next_h = rnn_one_step(x_t, h_t)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:26:55.346422Z",
     "start_time": "2018-08-13T20:26:55.342659Z"
    }
   },
   "outputs": [],
   "source": [
    "def generate_sample(seed_phrase=start_token, max_length=MAX_LENGTH):\n",
    "    '''\n",
    "    This function generates text given a `seed_phrase` as a seed.\n",
    "    Remember to include start_token in seed phrase!\n",
    "    Parameter `max_length` is used to set the number of characters in prediction.\n",
    "    '''\n",
    "    x_sequence = [token_to_id[token] for token in seed_phrase]\n",
    "    s.run(tf.assign(h_t, h_t.initial_value))\n",
    "    \n",
    "    # feed the seed phrase, if any\n",
    "    for ix in x_sequence[:-1]:\n",
    "         s.run(tf.assign(h_t, next_h), {x_t: [ix]})\n",
    "    \n",
    "    # start generating\n",
    "    for _ in range(max_length-len(seed_phrase)):\n",
    "        x_probs,_ = s.run([next_probs, tf.assign(h_t, next_h)], {x_t: [x_sequence[-1]]})\n",
    "        x_sequence.append(np.random.choice(n_tokens, p=x_probs[0]))\n",
    "        \n",
    "    return ''.join([tokens[ix] for ix in x_sequence if tokens[ix] != pad_token])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:26:58.458115Z",
     "start_time": "2018-08-13T20:26:55.347900Z"
    }
   },
   "outputs": [],
   "source": [
    "# without prefix\n",
    "for _ in range(10):\n",
    "    print(generate_sample())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:27:01.986726Z",
     "start_time": "2018-08-13T20:26:58.459810Z"
    }
   },
   "outputs": [],
   "source": [
    "# with prefix conditioning\n",
    "for _ in range(10):\n",
    "    print(generate_sample(' Trump'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Submit to Coursera"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:40:02.004926Z",
     "start_time": "2018-08-13T20:40:02.000821Z"
    }
   },
   "outputs": [],
   "source": [
    "# token expires every 30 min\n",
    "COURSERA_TOKEN = \"### YOUR TOKEN HERE ###\"\n",
    "COURSERA_EMAIL = \"### YOUR EMAIL HERE ###\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:40:18.923357Z",
     "start_time": "2018-08-13T20:40:03.549343Z"
    }
   },
   "outputs": [],
   "source": [
    "from submit import submit_char_rnn\n",
    "samples = [generate_sample(' Al') for i in tqdm_utils.tqdm_notebook_failsafe(range(25))]\n",
    "submission = (history, samples)\n",
    "submit_char_rnn(submission, COURSERA_EMAIL, COURSERA_TOKEN)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Try it out!\n",
    "\n",
    "__Disclaimer:__ This part of assignment is entirely optional. You won't receive bonus points for it. However, it's a fun thing to do. Please share your results on course forums.\n",
    "\n",
    "You've just implemented a recurrent language model that can be tasked with generating any kind of sequence, so there's plenty of data you can try it on:\n",
    "\n",
    "* Novels/poems/songs of your favorite author\n",
    "* News titles/clickbait titles\n",
    "* Source code of Linux or Tensorflow\n",
    "* Molecules in [smiles](https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system) format\n",
    "* Melody in notes/chords format\n",
    "* IKEA catalog titles\n",
    "* Pokemon names\n",
    "* Cards from Magic, the Gathering / Hearthstone\n",
    "\n",
    "If you're willing to give it a try, here's what you wanna look at:\n",
    "* Current data format is a sequence of lines, so a novel can be formatted as a list of sentences. Alternatively, you can change data preprocessing altogether.\n",
    "* While some datasets are readily available, others can only be scraped from the web. Try `Selenium` or `Scrapy` for that.\n",
    "* Make sure MAX_LENGTH is adjusted for longer datasets. There's also a bonus section about dynamic RNNs at the bottom.\n",
    "* More complex tasks require larger RNN architecture, try more neurons or several layers. It would also require more training iterations.\n",
    "* Long-term dependencies in music, novels or molecules are better handled with LSTM or GRU\n",
    "\n",
    "__Good hunting!__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# Bonus level: dynamic RNNs\n",
    "\n",
    "Apart from Keras, there's also a friendly TensorFlow API for recurrent neural nets. It's based around the symbolic loop function (aka [tf.scan](https://www.tensorflow.org/api_docs/python/tf/scan)).\n",
    "\n",
    "RNN loop that we implemented for training can be replaced with single TensorFlow instruction: [tf.nn.dynamic_rnn](https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn).\n",
    "This interface allows for dynamic sequence length and comes with some pre-implemented architectures.\n",
    "\n",
    "Take a look at [tf.nn.rnn_cell.BasicRNNCell](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/BasicRNNCell)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:27:12.975354Z",
     "start_time": "2018-08-13T20:27:12.737529Z"
    }
   },
   "outputs": [],
   "source": [
    "class CustomRNN(tf.nn.rnn_cell.BasicRNNCell):\n",
    "    def call(self, input, state):\n",
    "        # from docs:\n",
    "        # Returns:\n",
    "        # Output: A 2-D tensor with shape [batch_size, self.output_size].\n",
    "        # New state: Either a single 2-D tensor, or a tuple of tensors matching the arity and shapes of state.\n",
    "        return rnn_one_step(input[:, 0], state)\n",
    "    \n",
    "    @property\n",
    "    def output_size(self):\n",
    "        return n_tokens\n",
    "    \n",
    "cell = CustomRNN(rnn_num_units)\n",
    "\n",
    "input_sequence = tf.placeholder(tf.int32, (None, None))\n",
    "    \n",
    "predicted_probas, last_state = tf.nn.dynamic_rnn(cell, input_sequence[:, :, None], dtype=tf.float32)\n",
    "\n",
    "print('LSTM outputs for each step [batch,time,n_tokens]:')\n",
    "print(predicted_probas.eval({input_sequence: to_matrix(names[:10], max_len=50)}).shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that we never used MAX_LENGTH in the code above: TF will iterate over however many time-steps you gave it.\n",
    "\n",
    "You can also use any pre-implemented RNN cell:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:27:12.981697Z",
     "start_time": "2018-08-13T20:27:12.977590Z"
    }
   },
   "outputs": [],
   "source": [
    "for obj in dir(tf.nn.rnn_cell) + dir(tf.contrib.rnn):\n",
    "    if obj.endswith('Cell'):\n",
    "        print(obj, end=\"\\t\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2018-08-13T20:27:13.168207Z",
     "start_time": "2018-08-13T20:27:12.986884Z"
    }
   },
   "outputs": [],
   "source": [
    "input_sequence = tf.placeholder(tf.int32, (None, None))\n",
    "\n",
    "inputs_embedded = embed_x(input_sequence)\n",
    "\n",
    "# standard cell returns hidden state as output!\n",
    "cell = tf.nn.rnn_cell.LSTMCell(rnn_num_units)\n",
    "\n",
    "state_sequence, last_state = tf.nn.dynamic_rnn(cell, inputs_embedded, dtype=tf.float32)\n",
    "\n",
    "s.run(tf.global_variables_initializer())\n",
    "\n",
    "print('LSTM hidden state for each step [batch,time,rnn_num_units]:')\n",
    "print(state_sequence.eval({input_sequence: to_matrix(names[:10], max_len=50)}).shape)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
